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Abstract 

Most  works  on  relation  extraction  assume 
considerable  human  effort  for  making  an 
annotated  corpus  or  for  knowledge  engi¬ 
neering.  Generic  patterns  employed  in 
KnowItAll  achieve  unsupervised,  high- 
precision  extraction,  but  often  result  in  low 
recall.  This  paper  compares  two  boot¬ 
strapping  methods  to  expand  recall  that 
start  with  automatically  extracted  seeds 
by  KnowItAll.  The  first  method  is  string 
pattern  learning,  which  learns  string  con¬ 
texts  adjacent  to  a  seed  tuple.  The  second 
method  learns  less  restrictive  patterns  that 
include  bags  of  words  and  relation- specific 
named  entity  tags.  Both  methods  improve 
the  recall  of  the  generic  pattern  method.  In 
particular,  the  less  restrictive  pattern  learn¬ 
ing  method  can  achieve  a  250%  increase 
in  recall  at  0.87  precision,  compared  to  the 
generic  pattern  method. 

1  Introduction 

Relation  extraction  is  a  task  to  extract  tu¬ 
ples  of  entities  that  satisfy  a  given  relation 
from  textual  documents.  Examples  of  rela¬ 
tions  include  CeoOf(Company,  Ceo)  and  Acquisi- 
tion(Organization,  Organization).  There  has  been 
much  work  on  relation  extraction;  most  of  it  em¬ 
ploys  knowledge  engineering  or  supervised  ma¬ 
chine  learning  approaches  (Feldman  et  al.,  2002; 
Zhao  and  Grishman,  2005).  Both  approaches  are 
labor  intensive. 

We  begin  with  a  baseline  information  extraction 
system,  KnowItAll  (Etzioni  et  al.,  2005),  that  does 
unsupervised  information  extraction  at  Web  scale. 
KnowItAll  uses  a  set  of  generic  extraction  pat¬ 
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terns,  and  automatically  instantiates  rules  by  com¬ 
bining  these  patterns  with  user  supplied  relation 
labels.  For  example,  KnowItAll  has  patterns  for  a 
generic  “of”  relation: 

NP1  ’s  <relation>  ,  NP2 

NP2  ,  <  relation  >  of  NP1 

where  NP1  and  NP2  are  simple  noun  phrases  that 
extract  values  of  argument  1  and  argument2  of  a 
relation,  and  <relation>  is  a  user-supplied  string 
associated  with  the  relation.  The  rules  may  also 
constrain  NP1  and  NP2  to  be  proper  nouns. 

If  a  user  supplies  the  relation  labels  “ceo” 
and  “chief  executive  officer”  for  the  relation 
CeoOf(Company,  Ceo),  KnowItAll  inserts  these 
labels  into  the  generic  patterns  shown  above,  to 
create  4  extraction  rules: 

NP1  ’s  ceo  ,  NP2 

NP1  's  chief  executive  officer  ,  NP2 

NP2  ,  ceo  of  NP1 

NP2  ,  chief  executive  officer  of  NP1 

The  same  generic  patterns  with  different  la¬ 
bels  can  also  produce  extraction  rules  for  a  May- 
orOf  relation  or  an  InventorOf  relation.  These 
rules  have  alternating  context  strings  (exact  string 
match)  and  extraction  slots  (typically  an  NP  or 
head  of  an  NP).  This  can  produce  rules  with  high 
precision,  but  low  recall,  due  to  the  wide  variety 
of  contexts  describing  a  relation.  This  paper  looks 
at  ways  to  enhance  recall  over  this  baseline  system 
while  maintaining  high  precision. 

To  enhance  recall,  we  employ  bootstrapping 
techniques  which  start  with  seed  tuples,  i.e.  the 
most  frequently  extracted  tuples  by  the  baseline 
system.  The  first  method  represents  rules  with 
three  context  strings  of  tokens  immediately  adja¬ 
cent  to  the  extracted  arguments:  a  left  context, 
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middle  context,  and  right  context.  These  are  in¬ 
duced  from  context  strings  found  adjacent  to  seed 
tuples. 

The  second  method  uses  a  less  restrictive  pat¬ 
tern  representation  such  as  bag  of  words,  similar 
to  that  of  SnowBall(Agichtein,  2005).  SnowBall  is 
a  semi-supervised  relation  extraction  system.  The 
input  of  Snowball  is  a  few  hand  labeled  correct 
seed  tuples  for  a  relation  (e.g.  <Microsoft,  Steve 
Ballmer>  for  CeoOf  relation).  SnowBall  clusters 
the  bag  of  words  representations  generated  from 
the  context  strings  adjacent  to  each  seed  tuple,  and 
generates  rules  from  them.  It  calculates  the  confi¬ 
dence  of  candidate  tuples  and  the  rules  iteratively 
by  using  an  EM-algorithm.  Because  it  can  extract 
any  tuple  whose  entities  co-occur  within  a  win¬ 
dow,  the  recall  can  be  higher  than  the  string  pat¬ 
tern  learning  method.  The  main  disadvantage  of 
SnowBall  or  a  method  which  employs  less  restric¬ 
tive  patterns  is  that  it  requires  Named  Entity  Rec¬ 
ognizer  (NER). 

We  introduce  Relation-dependent  NER  (Rela¬ 
tion  NER),  which  trains  an  off-the-shelf  super¬ 
vised  NER  based  on  CRF(Lafferty  et  al.,  2001) 
with  bootstrapping.  This  learns  relation-specific 
NE  tags,  and  we  present  a  method  to  use  these  tags 
for  relation  extraction. 

This  paper  compares  the  following  two  boot¬ 
strapping  strategies. 

SPL:  a  simple  string  pattern  learning  method.  It 
learns  string  patterns  adjacent  to  a  seed  tuple. 

LRPL:  a  less  restrictive  pattern  learning  method. 
It  learns  a  variety  of  bag  of  words  patterns, 
after  training  a  Relation  NER. 

Both  methods  are  completely  self-supervised  ex¬ 
tensions  to  the  unsupervised  KnowItAll.  A  user 
supplies  KnowItAll  with  one  or  more  relation  la¬ 
bels  to  be  applied  to  one  or  more  generic  extrac¬ 
tion  patterns.  No  further  tagging  or  manual  selec¬ 
tion  of  seeds  is  required.  Each  of  the  bootstrapping 
methods  uses  seeds  that  are  automatically  selected 
from  the  output  of  the  baseline  KnowItAll  system. 

The  results  show  that  both  bootstrapping  meth¬ 
ods  improve  the  recall  of  the  baseline  system.  The 
two  methods  have  comparable  results,  with  LRPL 
outperforms  SPL  for  some  relations  and  SPL  out¬ 
performs  LRPL  for  other  relations. 

The  rest  of  the  paper  is  organized  as  follows. 
Section  2  and  3  describe  SPL  and  LRPL  respec¬ 
tively.  Section  4  reports  on  our  experiments,  and 


section  5  and  6  describe  related  works  and  conclu¬ 
sions. 

2  String  Pattern  Learning  (SPL) 

Both  SPL  and  LRPL  start  with  seed  tuples  that 
were  extracted  by  the  baseline  KnowItAll  system, 
with  extraction  frequency  at  or  above  a  threshold 
(set  to  2  in  these  experiments).  In  these  experi¬ 
ments,  we  downloaded  a  set  of  sentences  from  the 
Web  that  contained  an  occurrence  of  at  least  one 
relation  label  and  used  this  as  our  reservoir  of  un¬ 
labeled  training  and  test  sentences.  We  created  a 
set  of  positive  training  sentences  from  those  sen¬ 
tences  that  contained  both  argument  values  of  a 
seed  tuple. 

SPL  employs  a  method  similar  to  that  of 
(Downey  et  al.,  2004).  It  generates  candidate  ex¬ 
traction  rules  with  a  prefix  context,  a  middle  con¬ 
text,  and  a  right  context.  The  prefix  is  zero  to  Lsuje 
tokens  immediately  to  the  left  of  extracted  argu¬ 
ment  1,  the  middle  context  is  all  tokens  between 
argument  1  and  argument2,  and  the  right  context  of 
zero  to  Lsuje  tokens  immediately  to  the  right  of  ar- 
gument2.  It  discards  patterns  with  more  than  Lmu( 
intervening  tokens  or  without  a  relation  label. 

SPL  tabulates  the  occurrence  of  such  patterns 
in  the  set  of  positive  training  sentences  (all  sen¬ 
tences  from  the  reservoir  that  contain  both  argu¬ 
ment  values  from  a  seed  tuple  in  either  order),  and 
also  tabulates  their  occurrence  in  negative  training 
sentences.  The  negative  training  are  sentences  that 
have  one  argument  value  from  a  seed  tuple  and  a 
nearest  simple  NP  in  place  of  the  other  argument 
value.  This  idea  is  based  on  that  of  (Ravichan- 
dran  and  Hovy,  2002)  for  a  QA  system.  SPL 
learns  a  possibly  large  set  of  strict  extraction  rules 
that  have  alternating  context  strings  and  extraction 
slots,  with  no  gaps  or  wildcards  in  the  rules. 

SPL  selects  the  best  patterns  as  follows: 

1 .  Groups  the  context  strings  that  have  the  exact 
same  middle  string. 

2.  Selects  the  best  pattern  having  the  largest  pat¬ 
tern  score,  PS,  for  each  group  of  context 
strings  having  the  same  middle  string. 

Pg^p'j  —  _ \Spositive(p)  \  ~b  1 _ 

\S positive  ip)  |  T  |  S negative  ip)  |  T  Ot 

(1) 

3.  Selects  the  patterns  having  PS  greater  than 

7 pattern  ■ 
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Seed  Tuples 


Unlabeled  Sentences 

_ »|  TrainSentenceGenerator  \4 _  : 

Seed  Marked  Sentences 

|  TrainEntityRecognizer  | — ^crf — J  EntityRecognizer~|  | 
[1]  .Relation  NER  j 

Entity  Annotated  Sentences 

-»|  ContextRepresentationGenerator  ^  I 

_ *  Ctrain  R  C'CSt  _ 

|  TrainConfidenceEstimator  | - >J  ConfidenceEstimatoT] 

[2]:  Relation  Assesor 


Result  Tuples 

Figure  1 :  The  architecture  of  LRPL  (Less  Restric¬ 
tive  Pattern  Learning). 

where  Spositive{p )  is  a  set  of  sentences  that  match 
pattern  p  and  include  both  argument  values  of  a 
seed  tuple.  Snegative{p)  is  a  set  of  sentences  that 
match  p  and  include  just  one  argument  value  of 
a  seed  tuple  (e.g.  just  a  company  or  a  person  for 
CeoOf).  a  is  a  constant  for  smoothing. 

3  Less  Restrictive  Pattern  Learning 
(LRPL) 

LRPL  uses  a  more  flexible  rule  representation  than 
SPL.  As  before,  the  rules  are  based  on  a  window  of 
tokens  to  the  left  of  the  first  argument,  a  window 
of  middle  tokens,  and  a  window  of  tokens  to  the 
right  of  the  second  argument.  Rather  than  using 
exact  string  match  on  a  simple  sequence  of  tokens, 
LRPL  uses  a  combination  of  bag  of  words  and  im¬ 
mediately  adjacent  token.  The  left  context  is  based 
on  a  window  of  Ls.ujc  tokens  immediately  to  the 
left  of  argument  1.  It  has  two  sets  of  tokens:  the 
token  immediately  to  the  left  and  a  bag  of  words 
for  the  remaining  tokens.  Each  of  these  sets  may 
have  zero  or  more  tokens.  The  middle  and  right 
contexts  are  similarly  defined.  We  call  this  repre¬ 
sentation  extended  bag  of  words. 

Here  is  an  example  of  how  LRPL  represents 
the  context  of  a  training  sentence  with  win¬ 
dow  size  set  to  4.  “Yesterday  ,  <Arg2>Steve 
Ballmer</Arg2>,  the  Chief  Executive  Officer  of 
<Argl>Microsoft</Argl>  said  that  he  is  ...”. 

order :  arg2_argl 

values:  Steve  Ballmer,  Microsoft 

L:  {yesterday}  {,} 

M:  {,}  {chief  executive  officer  the}  {of} 
R:  {said}  {he  is  that} 

Some  of  the  tokens  in  these  bags  of  words  may 
be  dropped  in  merging  this  with  patterns  from 


other  training  sentences.  Each  rule  also  has  a  con¬ 
fidence  score,  learned  from  EM-estimation. 

We  experimented  with  simply  using  three  bags 
of  words  as  in  SnowBall,  but  found  that  precision 
was  increased  when  we  distinguished  the  tokens 
immediately  adjacent  to  argument  values  from  the 
other  tokens  in  the  left,  middle,  and  right  bag  of 
words. 

Less  restrictive  patterns  require  a  Named  Entity 
Recognizer  (NER),  because  the  patterns  can  not 
extract  candidate  entities  by  themselves1 .  LRPL 
trains  a  supervised  NER  in  bootstrapping  for  ex¬ 
tracting  candidate  entities. 

Figure  1  overviews  LRPL.  It  consists  of  two 
bootstrapping  modules:  Relation  NER  and  Rela¬ 
tion  Assessor.  LRPL  trains  the  Relational  NER 
from  seed  tuples  provided  by  the  baseline  Know- 
ItAll  system  and  unlabeled  sentences  in  the  reser¬ 
voir.  Then  it  does  NE  tagging  on  the  sentences  to 
learn  the  less  restrictive  rules  and  to  extract  can¬ 
didate  tuples.  The  learning  and  extraction  steps  at 
Relation  Assessor  are  similar  to  that  of  SnowBall; 
it  generates  a  set  of  rules  and  uses  EM-estimation 
to  compute  a  confidence  in  each  rule.  When  these 
rules  are  applied,  the  system  computes  a  probabil¬ 
ity  for  each  tuple  based  on  the  rule  confidence,  the 
degree  of  match  between  a  sentence  and  the  rule, 
and  the  extraction  frequency. 

3.1  Relation  dependent  Named  Entity 
Recognizer 

Relation  NER  leverages  an  off-the-shelf  super¬ 
vised  NER,  based  on  Conditional  Random  Fields 
(CRF).  In  Figure  1,  TrainSentenceGenerator  auto¬ 
matically  generates  training  sentences  from  seeds 
and  unlabeled  sentences  in  the  reservoir.  TrainEn¬ 
tityRecognizer  trains  a  CRF  on  the  training  sen¬ 
tences  and  then  EntityRecognizer  applies  the 
trained  CRF  to  all  the  unlabeled  sentences,  creat¬ 
ing  entity  annotated  sentences. 

It  can  extract  entities  whose  type  matches  an  ar¬ 
gument  type  of  a  particular  relation.  The  type  is 
not  explicitly  specified  by  a  user,  but  is  automati¬ 
cally  determined  according  to  the  seed  tuples.  For 
example,  it  can  extract  ‘City’  and  ‘Mayor’  type  en¬ 
tities  for  MayorOf(City,  Mayor)  relation.  We  de¬ 
scribe  CRF  in  brief,  and  then  how  to  train  it  in 
bootstrapping. 

'Although  using  all  noun  phrases  in  a  sentence  may  be 
possible,  it  apparently  results  in  low  precision. 
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3.1.1  Supervised  Named  Entity  Recognizer 

Several  state-of-the-art  supervised  NERs  are 
based  on  a  feature-rich  probabilistic  conditional 
classifier  such  as  Conditional  Random  Fields 
(CRF)  for  sequential  learning  tasks(Lafferty  et  al., 
2001;  Rosenfeld  et  al.,  2005).  The  input  of  CRF  is 
a  feature  sequence  X  of  features  Xj,  and  outputs  a 
tag  sequence  T  of  tags  tj .  In  the  training  phrase,  a 
set  of  <  Xj  ,Ti  >  is  provided,  and  outputs  a  model 
Mcrf.  In  the  applying  phase,  given  X,  it  outputs  a 
tag  sequence  T  by  using  Mcrf.  In  the  case  of  NE 
tagging,  given  a  sequence  of  tokens,  it  automat¬ 
ically  generates  a  sequence  of  feature  sets;  each 
set  is  corresponding  to  a  token.  It  can  incorporate 
any  properties  that  can  be  represented  as  a  binary 
feature  into  the  model,  such  as  words,  capitalized 
patterns,  part-of-speech  tags  and  the  existence  of 
the  word  in  a  dictionary.  It  works  quite  well  on 
NE  tagging  tasks  (McCallum  and  Ei,  2003). 

3.1.2  How  to  Train  Supervised  NER  in 
Bootstrapping 

We  use  bootstrapping  to  train  CRF  for  relation- 
specific  NE  tagging  as  follows:  1)  select  the  sen¬ 
tences  that  include  all  the  entity  values  of  a  seed 
tuple,  2)  automatically  mark  the  argument  values 
in  each  sentence,  and  3)train  CRF  on  the  seed 
marked  sentences.  An  example  of  a  seed  marked 
sentence  is  the  following: 

seed  tuple:  <Microsoft,  Steve  Ballmer> 

seed  marked  sentence : 

"Yesterday,  <Arg2>Steve  Ballmer</Arg2 > , 

CEO  of  <Argl>Microsoft</Argl> 

announced  that  ..." 

Because  of  redundancy,  we  can  expect  to  gen¬ 
erate  a  fairly  large  number  of  seed  marked  sen¬ 
tences  by  using  a  few  highly  frequent  seed  tuples. 
To  avoid  overfitting  on  terms  from  these  seed  tu¬ 
ples,  we  substitute  the  actual  argument  values  with 
random  characters  for  each  training  sentence,  pre¬ 
serving  capitalization  patterns  and  number  of  char¬ 
acters  in  each  token. 

3.2  Relation  Assessor 

Relation  Assessor  employs  several  SnowBall-like 
techniques  including  making  rules  by  clustering 
and  EM-estimation  for  the  confidence  of  the  rules 
and  tuples. 

In  Figure  1,  ContextRepresentationGenerator 
generates  extended  bag  of  words  contexts,  from 


entity  annotated  sentences,  and  classifies  the  con¬ 
texts  into  two  classes:  training  contexts  Cirain  (if 
their  entity  values  and  their  orders  match  a  seed 
tuple)  and  test  contexts  Ctest  (otherwise).  Train- 
ConfidenceEstimator  clusters  Ctrain  based  on  the 
match  score  between  contexts,  and  generates  a 
rule  from  each  cluster,  that  has  average  vectors 
over  contexts  belonging  to  the  cluster.  Given  a  set 
of  generated  rules  R  and  test  contexts  Qest,  Confi- 
denceEstimator  estimates  each  tuple  confidence  in 
Ctest  by  using  an  EM  algorithm.  It  also  estimates 
the  confidence  of  the  tuples  extracted  by  the  base¬ 
line  system,  and  outputs  the  merged  result  tuples 
with  confidence. 

We  describe  the  match  score  calculation 
method,  the  EM-algorithm,  and  the  merging 
method  in  the  following  sub  sections. 

3.2.1  Match  Score  Calculation 

The  match  score  (or  similarity)  M  of  two  ex¬ 
tended  bag  of  words  contexts  q,  Cj  is  calculated 
as  the  lineal-  combination  of  the  cosine  values  be¬ 
tween  the  corresponding  vectors. 

M(cu  Cj)  =  Y  wstcos(cist ,  Cjst)  (2) 

s,t 

where,  s  is  the  index  of  left,  middle,  or  right  con¬ 
texts.  t  is  the  index  of  left  adjacent,  right  adjacent, 
or  other  tokens.  wst.  is  the  weight  corresponding 
to  the  context  vector  indexed  by  s  and  t. 

To  achieve  high  precision.  Relation  Assessor 
uses  only  the  entity  annotated  sentences  that  have 
just  one  entity  for  each  argument  (two  entities 
in  total)  and  where  those  entities  co-occur  within 
Lmui  tokens  window,  and  it  uses  at  most  LSU)P  left 
and  right  tokens.  It  discards  patterns  without  a  re¬ 
lation  label. 

3.2.2  EM-estimation  for  tuple  and  rule 
confidence 

Several  rules  generated  from  only  positive  ev¬ 
idence  result  in  low  precision  (e.g.  rule  “of”  for 
MayorOf  relation  generated  from  “Rudolph  Giu¬ 
liani  of  New  York”).  This  problem  can  be  im¬ 
proved  by  estimating  the  rule  confidence  by  the 
following  EM-algorithm. 

1 .  For  each  cl}  in  Ctest,  identifies  the  best  match 
rule  r*(Cij),  based  on  the  match  score  be¬ 
tween  Cjj  and  each  rule  r.  q?  is  the  j th  con¬ 
text  that  includes  tuple  tj, . 

r"{cij)  =  argmaxrM(r,ey )  (3) 
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2.  Initializes  seed  tuple  confidence,  TC(ts)  =  1 
for  all  ts,  where  ts  is  a  seed  tuple. 

3.  Calculates  tuple  confidence,  TC,  and  rule 
confidence,  RC,  by  using  EM-algorithm.  E 
and  M  stages  are  iterated  several  times. 


E  stage: 


RC(rk ) 


EuTC(tu(rk))  +  l 
numOf  Matcher  k )  +  f3 


(4) 


Table  1 :  Weights  corresponding  to  a  context  vector 

(wst)- 


adjacency 

left 

other 

right 

total 

left 

0.067 

0.133 

0.2 

context  middle 

0.24 

0.12 

0.24 

0.6 

right 

0.133 

0.067 

0.2 

M  stage: 

TC(U)  =  (5) 

1  ““  II(1  ““  ^C(r''(cij))M(r‘(c!J),c!j))(6) 

3 

where 

TC */  rk=r*(cuj)  3 j 
'  “  ^  0  otherwise 

(3  is  a  constant  for  smoothing. 

This  algorithm  assigns  a  high  confidence  to  the 
rules  that  frequently  co-occur  with  only  high  con¬ 
fident  tuples.  It  also  assigns  a  high  confidence  to 
the  tuples  that  frequently  co-occur  with  the  con¬ 
texts  that  match  high  confidence  rules. 

When  it  merges  the  tuples  extracted  by  the  base¬ 
line  system,  the  algorithm  uses  the  following  con¬ 
stant  value  for  any  context  that  matches  a  baseline 
pattern. 

RC(rb)M(rij,  c^)  =  ma -x.  RC  (r*  (c))  M  (r*  (c) ,  c ) 

(7) 

where  c,i,  denotes  the  context  of  tuple  t,  that 
matches  a  baseline  pattern,  and  t),  is  any  baseline 
pattern.  With  this  calculation,  the  confidence  of 
any  tuple  extracted  by  a  baseline  pattern  is  always 
greater  than  or  equal  to  that  of  any  tuple  that  is 
extracted  by  the  learned  rules  and  has  the  same 
frequency. 

4  Evaluation 

The  focus  of  this  paper  is  the  comparison  be¬ 
tween  bootstrapping  strategies  for  extraction,  i.e., 
string  pattern  learning  and  less  restrictive  pattern 
learning  having  Relation  NER.  Therefore,  we  first 
compare  these  two  bootstrapping  methods  with 
the  baseline  system.  Furthermore,  we  also  com¬ 
pare  Relation  NER  with  a  generic  NER,  which  is 
trained  on  a  pre-existing  hand  annotated  corpus. 


4.1  Relation  Extraction  Task 

We  compare  SPL  and  LRPL  with  the  baseline  sys¬ 
tem  on  5  relations:  Acquisition,  Merger,  CeoOf, 
MayorOf,  and  InventorOf.  We  downloaded  about 
from  100,000  to  220,000  sentences  for  each  of 
these  relations  from  the  Web,  which  contained  a 
relation  label  (e.g.  “acquisition”,  “acquired”,  “ac¬ 
quiring”  or  “merger”,  “merged”,  “merging”).  We 
used  all  the  tuples  that  co-occur  with  baseline  pat¬ 
terns  at  least  twice  as  seeds.  The  numbers  of  seeds 
are  between  33  (Acquisition)  and  289  (CeoOf). 

For  consistency,  SPL  employs  the  same  assess¬ 
ment  methods  with  LRPL.  It  uses  the  EM  algo¬ 
rithm  in  Section  3.2.2  and  merges  the  tuples  ex¬ 
tracted  by  the  baseline  system.  In  the  EM  algo¬ 
rithm,  the  match  score  M  ( p ,  t )  between  a  learned 
pattern  p  and  a  tuple  t  is  set  to  a  constant  Tmatch  ■ 

LRPL  uses  MinorThird  (Cohen,  2004)  imple¬ 
mentation  of  CRF  for  Relation  NER.  The  features 
used  in  the  experiments  are  the  lower-case  word, 
capitalize  pattern,  part  of  speech  tag  of  the  cur¬ 
rent  and  +-2  tokens,  and  the  previous  state  (tag) 
referring  to  (Minkov  et  al.,  2005;  Rosenfeld  et  al., 
2005).  The  parameters  used  for  SPL  and  LRPL 
arc  experimentally  set  as  follows:  Tpnuern  =  0.5, 
Tmatch  =  0.6,  Lm.lci  =  5,  Lgide  =  3,  Ot  =  3, 
fj  =  3  and  the  context  weights  for  LRPL  shown  in 
Table  1. 

Figure  2-6  show  the  recall-precision  curves.  We 
use  the  number  of  correct  extractions  to  serve  as 
a  surrogate  for  recall,  since  computing  actual  re¬ 
call  would  require  extensive  manual  inspection  of 
the  large  data  sets.  Compared  to  the  the  baseline 
system,  both  bootstrapping  methods  increases  the 
number  of  correct  extractions  for  almost  all  the  re¬ 
lations  at  around  80%  precision.  For  MayorOf  re¬ 
lation,  LRPL  achieves  250%  increase  in  recall  at 
0.87  precision,  while  SPL’s  precision  is  less  than 
the  baseline  system.  This  is  because  SPL  can  not 
distinguish  correct  tuples  from  the  error  tuples  that 
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ceoOf 


Figure  2:  The  recall-precision  curve  of  CeoOf  re¬ 
lation. 


mayorOf 


Figure  3:  The  recall-precision  curve  of  MayorOf 
relation. 

co-occur  with  a  short  strict  pattern,  and  that  have  a 
wrong  entity  type  value.  An  example  of  the  error 
tuples  extracted  by  SPL  is  the  following: 

Learned  Pattern:  NP1  Mayor  NP2 
Sentence : 

"When  Lord  Mayor  Clover  Moore  spoke, . . . " 
Tuple:  <Lord,  Clover  Moore> 

The  improvement  of  Acquisition  and  Merger  re¬ 
lations  is  small  for  both  methods;  the  rules  learned 
for  Merger  and  Acquisition  made  erroneous  ex¬ 
tractions  of  mergers  of  geo-political  entities,  ac¬ 
quisition  of  data,  ball  players,  languages  or  dis¬ 
eases.  For  InventorOf  relation,  LRPL  does  not 
work  well.  This  is  because  ‘Invention’  is  not  a 
proper  noun  phrase,  but  a  noun  phrase.  A  noun 
phrase  includes  not  only  nouns,  but  a  particle, 
a  determiner,  and  adjectives  in  addition  to  non¬ 
capitalized  nouns.  Our  Relation  NER  was  unable 
to  detect  regularities  in  the  capitalization  pattern 
and  word  length  of  invention  phrases. 

At  around  60%  precision,  SPL  achieves  higher 
recall  for  CeoOf  and  MayorOf  relations,  in  con¬ 


Figure  4:  The  recall-precision  curve  of  Acquisi¬ 
tion  relation. 


Figure  5 :  The  recall-precision  curve  of  Merger  re¬ 
lation. 

trast,  LRPL  achieves  higher  recall  for  Acquisition 
and  Merger.  The  reason  can  be  that  nominal  style 
relations  (CeoOf  and  MayorOf)  have  a  smaller 
syntactic  variety  for  describing  them.  Therefore, 
learned  string  patterns  are  enough  generic  to  ex¬ 
tract  many  candidate  tuples. 

4.2  Entity  Recognition  Task 

Generic  types  such  as  person,  organization,  and 
location  cover  many  useful  relations.  One  might 
expect  that  NER  trained  for  these  generic  types, 
can  be  used  for  different  relations  without  mod¬ 
ifications,  instead  of  creating  a  Relation  NER. 
To  show  the  effectiveness  of  Relation  NER,  we 
compare  Relation  NER  with  a  generic  NER 
trained  on  a  pre-existent  hand  annotated  corpus 
for  generic  types;  we  used  MUC7  train,  dry-run 
test,  and  formal-test  docu mcntsfTable  2)  (Chin- 
chor,  1997).  We  also  incorporate  the  following 
additional  knowledge  into  the  CRF’s  features  re¬ 
ferring  to  (Minkov  et  al.,  2005;  Rosenfeld  et  al., 
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Figure  6:  The  recall-precision  curve  of  InventorOf 
relation. 


Table  2:  The  number  of  entities  and  unique  entities 
in  MUC7  corpus.  The  number  of  documents  is 
225.  _ 


entity 

all 

uniq 

Organization 

3704 

993 

Person 

2120 

1088 

Location 

2912 

692 

2005):  first  and  last  names,  city  names,  corp  des¬ 
ignators,  company  words  (such  as  “technology”), 
and  small  size  lists  of  person  title  (such  as  “mr.”) 
and  capitalized  common  words  (such  as  “Mon¬ 
day”).  The  base  features  for  both  methods  are  the 
same  as  the  ones  described  in  Section  4.1. 

The  ideal  entity  recognizer  for  relation  extrac¬ 
tion  is  recognizing  only  entities  that  have  an  ar¬ 
gument  type  for  a  particular  relation.  Therefore, 
a  generic  test  set  such  as  MUC7  Named  Entity 
Recognition  Task  can  not  be  used  for  our  evalu¬ 
ation.  We  randomly  selected  200  test  sentences 
from  our  dataset  that  had  a  pair  of  correct  enti¬ 
ties  for  CeoOf  or  MayorOf  relations,  and  were  not 
used  as  training  for  the  Relation  NER.  We  mea¬ 
sured  the  accuracy  as  follows. 


Recall  arg 


Precisionarg 


Etrue,arg  \ 


|  E extracted  FI  Eti 
\E extracted 


where,  Etrue^arg  is  a  set  of  true  entities  that  have 
an  argument  type  of  a  target  relation.  Eextracteci  is 
a  set  of  entities  extracted  as  an  argument. 

Because  Relation  NER  is  trained  for  argument 
types  (such  as  ‘Mayor’),  and  the  generic  NER  is 
trained  for  generic  types  (such  as  person),  this  cal¬ 


culation  is  in  favor  of  Relation  NER.  For  fair  com¬ 
parison,  we  also  use  the  following  measure. 


Precision 


\E extracted  FI  Efrue, generic \ 


generic 


|  Eextr  acted  \ 


(10) 


where,  Etrue^generic  is  a  set  of  true  entities  that 
have  a  generic  type  2 . 

Table  3  shows  that  the  Relation  NER  consis¬ 
tently  works  better  than  the  generic  NER,  even 
when  additional  knowledge  much  improved  the 
recall.  This  suggests  that  training  a  Relation  NER 
for  each  particular  relation  in  bootstrapping  is  bet¬ 
ter  approach  than  using  a  NER  trained  for  generic 
types. 


5  Related  Work 

SPL  is  a  similar  approach  to  DIPRE  (Brin,  1998) 
DIPRE  uses  a  pre-defined  simple  regular  expres¬ 
sion  to  identify  argument  values.  Therefore,  it  can 
also  suffer  from  the  type  error  problem  described 
above.  LRPL  avoids  this  problem  by  using  the  Re¬ 
lation  NER. 

LRPL  is  similar  to  SnowBall(Agichtein,  2005), 
which  employs  a  generic  NER,  and  reported  that 
most  emors  come  from  NER  errors.  Because  our 
evaluation  showed  that  Relation  NER  works  better 
than  generic  NER,  a  combination  of  Relation  NER 
and  SnowBall  can  make  a  better  result  in  other  set¬ 
tings.  3 

(Collins  and  Singer,  1999)  and  (Jones,  2005) 
describe  self-training  and  co-training  methods  for 
Named  Entity  Classification.  However,  the  prob¬ 
lem  of  NEC  task,  where  the  boundary  of  entities 
are  given  by  NP  chunker  or  parser,  is  different 
from  NE  tagging  task.  Because  the  boundary  of  an 
entity  is  often  different  from  a  NP  boundary,  the 
technique  can  not  be  used  for  our  purpose;  “Mi¬ 
crosoft  CEO  Steve  Ballmer”  is  tagged  as  a  single 
noun  phrase. 

6  Conclusion 

This  paper  describes  two  bootstrapping  strategies, 
SPL,  which  learns  simple  string  patterns,  and 
LRPL,  which  trains  Relation  NER  and  uses  it  with 
less  restrictive  patterns.  Evaluations  showed  both 

2  Although  Recall  generic  can  be  defined  in  the  same  way, 
we  did  not  use  it,  because  of  our  purpose  and  much  effort 
needed  for  complete  annotation  for  generic  types. 

3  Of  course,  further  study  needed  for  investigating  whether 
Relation  NER  works  with  a  smaller  number  of  seeds. 
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Table  3:  The  argument  precision  and  recall  is  the  average  over  all  arguments  for  CeoOf,  and  MayorOf 
relations.  The  Location  is  for  MayorOf,  Organization  is  for  CeoOf,  and  person  is  the  average  of  both 
relations. _ 


Argument 

Location 

Organization 

Person 

Recall 

Precision 

F 

Precision 

Precision 

Precision 

R-NER 

0.650 

0.912 

0.758 

0.922 

0.906 

0.955 

G-NER 

0.392 

0.663 

0.492 

0.682 

0.790 

0.809 

G-NER+dic 

0.577 

0.643 

0.606 

0.676 

0.705 

0.842 

methods  enhance  the  recall  of  the  baseline  sys¬ 
tem  for  almost  all  the  relations.  For  some  rela¬ 
tions,  SPL  and  LRPL  have  comparable  recall  and 
precision.  For  InventorOf,  where  the  invention  is 
not  a  named  entity,  SPL  performed  better,  because 
its  patterns  are  based  on  noun  phrases  rather  than 
named  entities. 

LRPL  works  better  than  SPL  for  MayorOf  re¬ 
lation  by  avoiding  several  errors  caused  by  the  tu¬ 
ples  that  co-occur  with  a  short  strict  context,  but 
have  a  wrong  type  entity  value.  Evaluations  also 
showed  that  Relation  NER  works  better  than  the 
generic  NER  trained  on  MUC7  corpus  with  addi¬ 
tional  dictionaries. 
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